ML-Evaluation of Classifiers

Evaluation Criteria

Predictive accuracy:

$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$$
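
A minimal sketch of this computation, assuming the true and predicted labels are available as two equal-length Python lists (the labels in the example call are purely illustrative):

```python
# Minimal sketch: accuracy = number of correct classifications / total number of test cases.
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Illustrative labels: 3 of the 4 predictions match, so accuracy is 0.75.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"]))
```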

Efficiency

  • Time to construct the model
  • Time to use the model

Robustness: handling noise and missing values

Scalability: efficiency on disk-resident databases

Interpretability: understandability of, and insight provided by, the model

Compactness of the model: size of the tree, or the number of rules.

Precision and Recall Measures

Confusion Matrix

|                 | Classified Positive | Classified Negative |
|-----------------|---------------------|---------------------|
| Actual Positive | TP                  | FN                  |
| Actual Negative | FP                  | TN                  |
$$\text{precision} = \frac{TP}{TP + FP} \qquad \text{recall} = \frac{TP}{TP + FN} \qquad F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
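
A minimal sketch of these three measures computed directly from the confusion-matrix counts; the counts in the example call are made up:

```python
# Minimal sketch: precision, recall and F1 from confusion-matrix counts
# for the positive class (TP, FP, FN).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Made-up counts: precision = 0.8, recall ≈ 0.667, F1 ≈ 0.727.
print(precision_recall_f1(tp=40, fp=10, fn=20))
```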

ROC Curve

True positive rate:

$$TPR = \frac{TP}{TP + FN}$$

False positive rate (equal to 1 minus the true negative rate):

$$FPR = \frac{FP}{TN + FP}$$

How do we compare two ROC curves? Compute the area under the curve (AUC).

If the AUC of classifier Ci is greater than that of Cj, Ci is said to be better than Cj. A perfect classifier has an AUC of 1, and a classifier that makes only random guesses has an AUC of 0.5.
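
As a sketch of how the ROC points and the AUC can be computed, the snippet below sweeps a decision threshold over classifier scores, records the (FPR, TPR) points, and integrates them with the trapezoidal rule. The labels and scores in the example are illustrative, and tied scores are not handled specially:

```python
# Minimal sketch: ROC points and AUC from classifier scores.
# y_true holds 1 (positive) / 0 (negative); scores are the classifier's
# confidence for the positive class.
def roc_auc(y_true, scores):
    # Sort examples by decreasing score and sweep the threshold down the list.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR) pairs, starting at the origin
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal rule over the ROC points gives the area under the curve.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

# Illustrative labels and scores: AUC is 8/9 ≈ 0.889.
_, auc = roc_auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(auc)
```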

Evaluation Methods

Holdout set: The available data set D is divided into two disjoint subsets,

  • the training set Dtrain (for learning a model)
  • the test set Dtest (for testing the model)

Important: training set should not be used in testing and the test set should not be used in learning.

  • An unseen test set provides an unbiased estimate of accuracy.

The test set is also called the holdout set. (The examples in the original data set D are all labeled with classes.)

This method is mainly used when the data set D is large.
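
A minimal sketch of a holdout split, assuming D is a list of labeled examples; the 70/30 ratio and the helper name are illustrative choices, not fixed by the method:

```python
import random

# Minimal sketch: split a labeled data set D into disjoint training and test subsets.
def holdout_split(D, test_fraction=0.3, seed=0):
    examples = list(D)
    random.Random(seed).shuffle(examples)
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]  # (D_train, D_test)

D = [(x, x % 2) for x in range(100)]  # toy labeled examples (feature, class)
D_train, D_test = holdout_split(D)
print(len(D_train), len(D_test))  # 70 30
```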

n-fold cross-validation:

The available data is partitioned into n equal-size disjoint subsets. Each subset is used in turn as the test set, and the remaining n − 1 subsets are combined as the training set to learn a classifier.

The procedure is run n times, giving n accuracies.

The final estimated accuracy of learning is the average of the n accuracies.

10-fold and 5-fold cross-validations are commonly used.

This method is used when the available data is not large.
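
A minimal sketch of the n-fold procedure, assuming D is a list of labeled examples; the majority-class learner below is only a placeholder for whatever classifier is actually being evaluated:

```python
import random

# Minimal sketch: n-fold cross-validation returning the average accuracy.
def cross_validation(D, n_folds, train, evaluate, seed=0):
    examples = list(D)
    random.Random(seed).shuffle(examples)
    folds = [examples[i::n_folds] for i in range(n_folds)]  # n disjoint subsets
    accuracies = []
    for i in range(n_folds):
        test_fold = folds[i]
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(train_set)
        accuracies.append(evaluate(model, test_fold))
    return sum(accuracies) / n_folds  # average of the n accuracies

# Placeholder learner: always predict the most frequent class in the training set.
def majority_class(examples):
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def evaluate(model, examples):
    return sum(1 for _, y in examples if y == model) / len(examples)

D = [(x, 0 if x < 60 else 1) for x in range(100)]  # toy labeled examples
print(cross_validation(D, 10, majority_class, evaluate))
```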